(54) Title: SYSTEMS AND METHODS FOR DIGITAL DOCUMENT PROCESSING

(57) Abstract: Display technologies that separate the underlying functionality of an application program from the graphical display 
process, thereby eliminating or reducing the application's need to control the device display and to provide graphical user interface 
tools and controls for the display. Additionally, such systems reduce or eliminate the need for an application program to be present on 
a processing system when displaying data created by or for that application program, such as a document or video stream. Thus it will 
be understood that in one aspect, the systems and methods described herein can display content, including documents, video streams,
or other content, and will provide the graphical user functions for viewing the displayed document, such as zoom, pan, or other such 
functions, without need for the underlying application to be present on the system that is displaying the content. The advantages over 
the prior art of the systems and methods described herein include the advantage of allowing different types of content from different 
application programs to be shown on the same display within the same work space. 
1 Systems and Methods for Digital Document Processing

2 Related Applications 

3 This application claims priority to earlier filed 

4 British Patent Application No. 0009129.8, filed 14 

5 April 2000, and US Patent Application Serial Number 

6 09/703,502 filed 31 October 2000, both having Majid 

7 Anwar as an inventor, the contents of which are 

8 hereby incorporated by reference. 

9 Field of the Invention 

10 The invention relates to data processing systems, 

11 and more particularly, to methods and systems for 

12 processing digital documents to generate an output 

13 representation of a source document as a visual 

14 display, a hardcopy, or in some other display 

15 format. 

16 Background 
1 As used herein, the term "digital document" is used

2 to describe a digital representation of any type of 

3 data processed by a data processing system which is 

4 intended, ultimately, to be output in some form, in 

5 whole or in part, to a human user, typically by 

6 being displayed or reproduced visually (e.g., by 

7 means of a visual display unit or printer), or by

8 text-to-speech conversion, etc. A digital document 

9 may include any features capable of representation, 

10 including but not limited to the following: text; 

11 graphical images; animated graphical images; full 

12 motion video images; interactive icons, buttons, 

13 menus or hyperlinks. A digital document may also 

14 include non-visual elements such as audio (sound) 

15 elements. 

16 Data processing systems, such as personal computer 

17 systems, are typically required to process "digital 

18 documents," which may originate from any one of a

19 number of local or remote sources and which may 

20 exist in any one of a wide variety of data formats 

21 ("file formats"). In order to generate an output 

22 version of the document, whether as a visual display 

23 or printed copy, for example, it is necessary for 

24 the computer system to interpret the original data 

25 file and to generate an output compatible with the 

26 relevant output device (e.g., monitor, or other 

27 visual display device or printer). In general, this

28 process will involve an application program adapted 

29 to interpret the data file, the operating system of 

30 the computer, a software "driver" specific to the 

31 desired output device and, in some cases 
1 (particularly for monitors or other visual display

2 units), additional hardware in the form of an

3 expansion card. 

4 This conventional approach to the processing of 

5 digital documents in order to generate an output is 

6 inefficient in terms of hardware resources, software 

7 overheads and processing time, and is completely 

8 unsuitable for low power, portable data processing 

9 systems, including wireless telecommunication 

10 systems, or for low cost data processing systems 

11 such as network terminals, etc. Other problems are 

12 encountered in conventional digital document 

13 processing systems, including the need to configure 

14 multiple system components (including both hardware 

15 and software components) to interact in the desired 

16 manner, and inconsistencies in the processing of 

17 identical source material by different systems 

18 (e.g., differences in formatting, color 

19 reproduction, etc.). In addition, the conventional 

20 approach to digital document processing is unable to 

21 exploit the commonality and/or re-usability of file 

22 format components. 

23 Summary of the Invention 

24 It is an object of the present invention to provide 

25 digital document processing methods and systems, and 

26 devices incorporating such methods and systems, 

27 which obviate or mitigate the aforesaid 

28 disadvantages of conventional methods and systems. 
1 The systems and methods described herein provide a

2 display technology that separates the underlying 

3 functionality of an application program from the 

4 graphical display process, thereby eliminating or 

5 reducing the application's need to control the 

6 device display and to provide graphical user 

7 interface tools and controls for the display. 

8 Additionally, such systems reduce or eliminate the 

9 need for an application program to be present on a 

10 processing system when displaying data created by or 

11 for that application program, such as a document or 

12 video stream. Thus it will be understood that in 

13 one aspect, the systems and methods described herein 

14 can display content, including documents, video 

15 streams, or other content, and will provide the 

16 graphical user functions for viewing the displayed 

17 document, such as zoom, pan, or other such 

18 functions, without need for the underlying 

19 application to be present on the system that is 

20 displaying the content. The advantages over the 

21 prior art of the systems and methods described 

22 herein include the advantage of allowing different 

23 types of content from different application programs 

24 to be shown on the same display within the same work 

25 space. Many more advantages will be apparent to 

26 those of ordinary skill in the art, and those of

27 ordinary skill in the art will also be able

28 to see numerous ways of employing the underlying

29 technology of the invention for creating additional 

30 systems, devices, and applications. These modified 

31 systems and alternate systems and practices will be 
1 understood to fall within the scope of the

2 invention. 
3 

4 More particularly, the systems and methods described 

5 herein include a digital content processing system 

6 that comprises an application dispatcher for 

7 receiving an input byte stream representing source 

8 data in one of a plurality of predetermined data 

9 formats and for associating the input byte stream 

10 with one of the predetermined data formats. The 

11 system may also comprise a document agent for 

12 interpreting the input byte stream as a function of 

13 the associated predetermined data format and for 

14 parsing the input byte stream into a stream of 

15 document objects that provide an internal 

16 representation of primitive structures within the 

17 input byte stream. The systems also include a core 

18 document engine for converting the document objects 

19 into an internal representation data format and for 

20 mapping the internal representation data to a 

21 location on a display. A shape processor within the 

22 system processes the internal representation data to 

23 drive an output device to present the content as 

24 expressed through the internal representation. 
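
The four stages named above (application dispatcher, document agent, core document engine, shape processor) can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions: every class and function name, the "text/plain" format key, the fixed line height, and the textual drawing commands are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class DocumentObject:
    """A primitive structure parsed out of the input byte stream."""
    kind: str      # e.g. "text"
    payload: str


@dataclass
class PlacedObject:
    """Internal representation data mapped to a display location."""
    obj: DocumentObject
    x: int
    y: int


def plain_text_agent(stream: bytes) -> Iterator[DocumentObject]:
    """Hypothetical document agent: one text object per non-empty line."""
    for line in stream.decode("utf-8").splitlines():
        if line.strip():
            yield DocumentObject("text", line)


# Hypothetical registry mapping predetermined data formats to agents.
AGENTS: Dict[str, Callable[[bytes], Iterator[DocumentObject]]] = {
    "text/plain": plain_text_agent,
}


def dispatch(stream: bytes) -> str:
    """Application dispatcher: associate the stream with a known format."""
    try:
        stream.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("unrecognized format")
    return "text/plain"


def core_engine(objects: Iterator[DocumentObject],
                line_height: int = 12) -> List[PlacedObject]:
    """Core document engine: map each object to a display location."""
    return [PlacedObject(o, 0, i * line_height) for i, o in enumerate(objects)]


def shape_processor(placed: List[PlacedObject]) -> List[str]:
    """Shape processor: emit simple drawing commands for an output device."""
    return [f"draw_text({p.x}, {p.y}, {p.obj.payload!r})" for p in placed]


def process(stream: bytes) -> List[str]:
    """Run the byte stream through all four stages in order."""
    fmt = dispatch(stream)
    return shape_processor(core_engine(AGENTS[fmt](stream)))
```

In this sketch only the agent knows anything about the source format; the later stages see only generic document objects, which is the separation the summary describes.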
25 

26 Embodiments of the invention will now be described, 

27 by way of example only, with reference to the 

28 accompanying drawings. 



29 Brief Description of the Drawings
1 The foregoing and other objects and advantages of

2 the invention will be appreciated more fully from 

3 the following further description thereof, with 

4 reference to the accompanying drawings, wherein: 

5 Figure 1 is a block diagram illustrating an

6 embodiment of a digital document processing system

7 in accordance with the present invention;

8 Figure 2 is a block diagram that presents in greater 

9 detail the system depicted in Figure 1; 



10 Figure 3 is a flowchart diagram of one document

11 agent;

12 Figure 4 depicts schematically an exemplary document 

13 of the type that can be processed by the system of 

14 Figure 1; 

15 Figure 5 depicts flowchart diagrams of two 

16 exemplary processes employed to reduce redundancy 

17 within the internal representation of a document; 

18 and 

19 Figures 6-8 depict an exemplary data structure for

20 storing an internal representation of a processed

21 source document. 

22 Detailed Description of Certain Illustrated

23 Embodiments

24 The systems and methods described herein include

25 computer programs that operate to process an output
1 stream or output file generated by an application

2 program for the purpose of presenting the output on 

3 an output device, such as a video display. The 

4 applications according to the invention can process 

5 these streams to create an internal representation 

6 of that output and can further process that internal 

7 representation to generate a new output stream that 

8 may be displayed on an output device as the output 

9 generated by the application according to the 

10 invention. Accordingly, the systems of the 

11 invention decouple the application program from the 

12 display process thus relieving the application 

13 program from having to display its output onto a 

14 particular display device and further removing the

15 need to have the application program present when 

16 processing the output of that application for the 

17 purpose of displaying that output. 

18 To illustrate this operation, Figure 1 provides a 

19 high-level functional block diagram of a system 10 

20 that allows a plurality of application programs, 

21 shown collectively as element 13, to deliver their 

22 output streams to a computer process 8 that 

23 processes those output streams and generates a 

24 representation of the collective output created by 

25 those streams for display on the device 26. The 

26 collective output of the application programs 13 is 

27 depicted in Figure 1 by the output printer device 26 

28 that presents the output content generated by the 

29 different application programs 13. It will be 

30 understood by those of skill in the art that the output

31 device 26 is presenting output generated by the
1 computer process 8 and that this output collectively

2 carries the content of the plural application

3 programs 13. In the illustration provided by

4 Figure 1, the presented content comprises a 

5 plurality of images and the output device 26 is a 

6 display. However, it will be apparent to those of 

7 skill in the art that in other practices the content 

8 may be carried in a format other than images, such 

9 as auditory, tactile, or any other format, or

10 combination of formats suitable for conveying 

11 information to a user. Moreover, it will be 

12 understood by those of skill in the art that the 

13 type of output device 26 will vary according to the 

14 application and may include devices for presenting 

15 audio content, video content, printed content, 

16 plotted content or any other type of content. For 

17 the purpose of illustration, the systems and methods 

18 described herein will largely be shown as displaying 

19 graphical content through display devices, yet it 

20 will be understood that these exemplary systems are 

21 only for the purpose of illustration, and not to be 

22 understood as limiting in any way. Thus the output

23 generated by the application programs 13 is 

24 processed and aggregated by the computer process 8 

25 to create a single display that includes all the 

26 content generated by the individual application 

27 programs 13.

28 In the depicted embodiment, each of the 

29 representative outputs appearing on display 26 is 

30 termed a document, and each of the depicted 

31 documents can be associated with one of the 
1 application programs 13. It will be understood that

2 the term document as used herein will encompass 

3 documents, streamed video, streamed audio, web 

4 pages, and any other form of data that can be 

5 processed and displayed by the computer process 8. 

6 The computer process 8 generates a single output 

7 display that includes within that display one or 

8 more of the documents generated from the application 

9 programs 13. The collection of displayed documents

10 represents the content generated by the application 

11 programs 13 and this content is displayed within the 

12 program window generated by the computer process 8. 

13 The program window for the computer process 8 also 

14 may include a set of icons representative of tools 

15 provided with the graphical user interface and 

16 capable of allowing a user to control the operation, 

17 in this case the display, of the documents appearing 

18 in the program window. 

19 In contrast, the conventional approach of having 

20 each application program form its own display would 

21 result in a presentation on the display device 26

22 that included several program windows, typically one 

23 for each application program 13. Additionally, each 

24 different type of program window would include a 

25 different set of tools for manipulating the content 

26 displayed in that window. Thus the system 10 of the 

27 invention has the advantage of providing a 

28 consistent user interface, and only requiring

29 knowledge of one set of tools for displaying and 

30 controlling the different documents. Additionally,

31 the computer process 8 operates on the output of the
1 application programs 13, thus only requiring that

2 output to create the documents that appear within 

3 the program window. Accordingly, it is not 

4 necessary that the application programs 13 be 

5 resident on the same machine as the process 8, nor 

6 that the application programs 13 operate in concert 

7 with the computer process 8. The computer process 8 

8 needs only the output from these application 

9 programs 13, and this output can be derived from 

10 stored data files that were created by the 

11 application programs 13 at an earlier time. 

12 However, the systems and methods described herein 

13 may be employed as part of systems wherein an 

14 application program is capable of presenting its own 

15 content, controlling at least a portion of the 

16 display 26 and presenting that content within a

17 program window associated with that application 

18 program. In these embodiments the systems and 

19 methods of the invention can work as separate 

20 applications that appear on the display within a 

21 portion of the display provided for its use. 

22 More particularly, Figure 1 depicts a plurality of 

23 application programs 13. These application programs 

24 can include word processing programs such as Word, 

25 WordPerfect, or any other similar word processing 
26 program. They can further include programs such as

27 Netscape Composer that generates HTML files, Adobe 

28 Acrobat that processes PDF files, a web server that 

29 delivers XML or HTML, a streaming server that 

30 generates a stream of audio-visual data, an e-mail 

31 client or server, a database, spreadsheet or any 
1 other kind of application program that delivers

2 output either as a file, data stream, or in some 

3 other format suitable for use by a computer process. 

4 In the embodiment of Figure 1 each of the 

5 application programs 13 presents its output content 

6 to the computer process 8. In operation this can 

7 occur by having the application program 13 direct

8 its output stream as an input byte stream to the 

9 computer process 8. The use of data streams is 

10 well known to those of ordinary skill in the art and 

11 described in the literature, including for example, 

12 Stephen G. Kochan, Programming in C, Hayden 
13 Publishing (1983). Optionally, the application

14 program 13 can create a data file such as a Word 

15 document, that can be streamed into the computer 

16 process 8 either by a separate application or by the 

17 computer process 8. 

18 The computer process 8 is capable of processing the 

19 various input streams to create the aggregated 

20 display shown on display device 26. To this end, 

21 and as will be shown in greater detail hereinafter, 

22 the computer process 8 processes the incoming 

23 streams to generate an internal representation of 

24 each of these input streams. In one practice this 

25 internal representation is meant to look as close as 

26 possible to the output stream of the respective

27 application program 13. However, in other 

28 embodiments the internal representation may be 

29 created to have a selected, simplified or partial 

30 likeness to the output stream generated by the

31 respective application program 13. Additionally and 
1 optionally, the systems and methods described herein

2 may also apply filters to the content being 

3 translated thereby allowing certain portions of the 

4 content to be removed from the content displayed or 

5 otherwise presented. Further, the systems and 

6 methods described herein may allow alteration of the 

7 structure of the source document, allowing for 

8 repositioning content within a document, rearranging 

9 the structure of the document, or selecting only 

10 certain types of data. Similarly in an optional 

11 embodiment, content can be added during the 

12 translation process, including active content such 

13 as links to web sites. In either case, the internal 

14 representation created by computer process 8 may be 

15 further processed by the computer process 8 to drive 

16 the display device 26 to create the aggregated image

17 represented in Figure 1. 

18 Turning to Figure 2, a more detailed representation 

19 of the system of Figure 1 is presented. 

20 Specifically, Figure 2 depicts the system 10 which 

21 includes the computer process 8, the source

22 documents 11, and a display device 26. The

23 computer process 8 includes a plurality of document 

24 agents 12, an internal representation format file 

25 and process 14, buffer storage 15, a library of 

26 generic objects 16, a core document engine that in

27 this embodiment comprises a parsing module 18, and a 

28 rendering module 19, an internal view 20, a shape 

29 processor 22 and a final output 24. Figure 2 

30 further depicts an optional input device 30 for

31 transmitting user input 40 to the computer process 
1 8. The depicted embodiment includes a process 8

2 that comprises a shape processor 22. However, it 

3 will be apparent to those of ordinary skill in the 

4 art, that the depicted process 8 is only exemplary 

5 and that the process 8 may be realized through 

6 alternate processes and architectures. For example, 

7 the shape processor 22 may optionally be realized as 

8 a hardware component, such as a semiconductor 

9 device, that supports the operation of the other 

10 elements of the process 8. Moreover, it will be 

11 understood that although Figure 2 presents process 8 

12 as a functional block diagram that comprises a 

13 single system, it may be that process 8 is 

14 distributed across a number of different platforms, 

15 and optionally it may be that the elements operate 

16 at different times and that the output from one 

17 element of process 8 is delivered at a later time as 

18 input to the next element of process 8. 

19 As discussed above, each source document 11 is 

20 associated with a document agent 12 that is capable 

21 of translating the incoming document into an 

22 internal representation of the content of that 

23 source document 11. To identify the appropriate 

24 document agent 12 to process a source document 11, 

25 the system 10 of Figure 1 includes an application 

26 dispatcher (not shown) that controls the interface 

27 between application programs and the system 10. In 

28 one practice, the use of an external application 

29 programming interface (API) is handled by the 

30 application dispatcher which passes data, calls the 

31 appropriate document agent 12, or otherwise carries 
1 out the request made by the application program. To

2 select the appropriate document agent 12 for a 

3 particular source document 11, the application 

4 dispatcher advertises the source document 11 to all 

5 the loaded document agents 12. These document 

6 agents 12 then respond with information regarding 

7 their particular suitability for translating the 

8 content of the published source document 11. Once

9 the document agents 12 have responded, the 

10 application dispatcher selects a document agent 12 

11 and passes a pointer, such as a URI of the source 

12 document 11, to the selected document agent 12. 
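
The advertise, respond and select exchange described above can be sketched as follows. This is an illustrative sketch only: the `DocumentAgent` class, its `suitability` and `translate` methods, the numeric score, and the extension-based scoring rule are all assumptions introduced here, since the description does not specify how agents express their suitability.

```python
from typing import List, Optional


class DocumentAgent:
    """Hypothetical agent interface; all names here are illustrative."""

    def __init__(self, name: str, extensions: List[str]) -> None:
        self.name = name
        self.extensions = extensions
        self.received_uri: Optional[str] = None

    def suitability(self, uri: str) -> float:
        """Respond to an advertisement with a score between 0 and 1."""
        lowered = uri.lower()
        return 1.0 if any(lowered.endswith(e) for e in self.extensions) else 0.0

    def translate(self, uri: str) -> None:
        """Accept the pointer to the source document (translation omitted)."""
        self.received_uri = uri


def select_agent(uri: str, agents: List[DocumentAgent]) -> Optional[DocumentAgent]:
    """Advertise the document to every loaded agent and pick the best one."""
    responses = [(agent.suitability(uri), agent) for agent in agents]
    best_score, best = max(responses, key=lambda pair: pair[0],
                           default=(0.0, None))
    if best is None or best_score <= 0.0:
        return None  # no suitable agent is loaded
    best.translate(uri)  # pass the pointer (here a URI) to the selected agent
    return best
```

A return value of `None` corresponds to the case, discussed later in the description, in which the system lacks a suitable document agent.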

13 In one practice, the computer process 8 may be run 

14 as a service under which a plurality of threads may 

15 be created thereby supporting multi-processing of 

16 plural document sources 11. In other embodiments, 

17 the process 8 does not support multi-threading and

18 the document agent 12 selected by the application 

19 dispatcher will be called in the current thread. 
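
The two modes of operation described above, threaded and single-threaded, might be sketched as below. The function name `translate_all`, the `multithreaded` flag and the worker count are hypothetical; the sketch uses the standard Python thread pool purely for illustration and is not the implementation described in the specification.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def translate_all(uris: Sequence[str],
                  translate: Callable[[str], str],
                  multithreaded: bool = True) -> List[str]:
    """Translate several source documents, optionally on worker threads."""
    if not multithreaded:
        # The selected agent is simply called in the current thread.
        return [translate(uri) for uri in uris]
    # One worker thread per document, up to a small fixed limit.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(translate, uris))
```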

20 It will be understood that the exemplary embodiment 

21 of Figure 2 provides a flexible and extensible front 

22 end for processing incoming data streams of 

23 different file formats. For example, optionally, 

24 if the application dispatcher determines that the 

25 system lacks a document agent 12 suitable for 

26 translating the source document 11, the application 

27 dispatcher can signal the respective application 

28 program 13 indicating that the source document 11 is 

29 in an unrecognized format. Optionally, the 

30 application program 13 may choose to allow the 
1 reformatting of the source document 11, such as by

2 converting the source document 11 produced by the 

3 application program 13 from its present format into 

4 another format supported by that application program 

5 13. For example an application program 13 may 

6 determine that the source document 11 needs to be 

7 saved in a different format, such as an earlier 

8 version of the file format. To the extent that the 

9 application program 13 supports that format, the 

10 application program 13 can resave the source 

11 document 11 in this supported format in order that a 

12 document agent 12 provided by the system 10 will be 

13 capable of translating the source document 11. 

14 Optionally, the application dispatcher, upon 

15 detecting that the system 10 lacks a suitable 

16 document agent 12, can indicate to a user that a new 

17 document agent of a particular type may be needed 

18 for translating the present source document 11. To 

19 this end, the computer process 8 may indicate to the 

20 user that a new document agent needs to be loaded 

21 into the system 10 and may direct the user to a 

22 location, such as a web site, from where the new 

23 document agent 12 may be downloaded. Optionally, 

24 the system could fetch automatically the document 

25 agent without asking the user, or could identify a 

26 generic agent 12, such as a generic text agent that 

27 can extract portions of the source document 11 

28 representative of text. Further, agents that prompt 

29 a user for input and instruction during the 

30 translation process may also be provided. 
1 In a still further optional embodiment, an

2 application dispatcher in conjunction with the 

3 document agents 12 acts as an input module that 

4 identifies the file format of the source document 11 

5 on the basis of any one of a variety of criteria, 

6 such as an explicit file-type identification within 

7 the document, from the file name, including the file 

8 name extension, or from known characteristics of the 

9 content of particular file types. The bytestream is 

10 input to the document agent 12, specific to the file 

11 format of the source document 11. 
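
The three identification criteria listed above (an explicit identifier within the document, the file name extension, and known characteristics of the content) can be tried in turn. The sketch below is an assumption-laden illustration: the function name `identify_format`, the order in which the criteria are tried, and the short format labels are hypothetical. The leading byte signatures shown for PDF, PNG, GIF and JPEG are the well-known ones for those formats.

```python
from typing import Optional

# Leading byte signatures that explicitly identify a few formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}

# Hypothetical mapping from file name extension to format label.
EXTENSIONS = {
    ".pdf": "pdf", ".png": "png", ".gif": "gif",
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".html": "html", ".htm": "html", ".txt": "text",
}


def identify_format(data: bytes, filename: Optional[str] = None) -> Optional[str]:
    """Return a format label, or None if the format is not recognized."""
    # 1. Explicit file-type identification within the document.
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    # 2. File name, including the file name extension.
    if filename:
        lowered = filename.lower()
        for ext, fmt in EXTENSIONS.items():
            if lowered.endswith(ext):
                return fmt
    # 3. Known characteristics of the content of particular file types.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return None
```

Checking the embedded signature before the file name is a design choice made for this sketch, so that a misnamed file is still routed by its actual content.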

12 Although the above description has discussed input 

13 data being provided by a stream or computer file, it 

14 shall be understood by those of skill in the art 

15 that the system 10 may also be applied to input 

16 received from an input device such as a digital 

17 camera or scanner as well as from an application 

18 program that can directly stream its output to the 

19 process 8, or that has its output streamed by an 

20 operating system to the process 8. In this case the 

21 input bytestream may originate directly from the 

22 input device, rather than from a source document 11.

23 However, the input bytestream will still be in a 

24 data format suitable for processing by the system 10 

25 and, for the purposes of the invention, input 

26 received from such an input device may be regarded 

27 as a source document 11. 

28 As shown in Figure 2, the document agent 12 employs 

29 the library 16 of standard objects to generate the 

30 internal representation 14, which describes the 
1 content of the source document in terms of a

2 collection of document objects whose generic types 

3 are as defined in the library 16, together with 

4 parameters defining the properties of specific 

5 instances of the various document objects within the 

6 document. Thus, the library 16 provides a set of 

7 types of objects which the document agents 12, the 

8 parser 18 and the system 10 have knowledge of. For 

9 example, the document objects employed in the 

10 internal representation 14 may include: text, 

11 bitmap graphics and vector graphics document objects 

12 which may or may not be animated and which may be 

13 two- or three-dimensional; video, audio and a

14 variety of types of interactive objects such as 

15 buttons and icons. Vector graphics document objects 

16 may be PostScript-like paths with specified fill and 

17 transparency. Bitmap graphic document objects may 

18 include a set of sub-object types such as for 

19 example JPEG, GIF and PNG object types. Text 

20 document objects may declare a region of stylized 

21 text. The region may include a paragraph of text,

22 typically understood as a set of characters that 

23 appears between two delimiters, like a pair of 

24 carriage returns. Each text object may include a 

25 run of characters and the styling information for 

26 that character run including one or more associated 

27 typefaces, points and other such styling 

28 information. 
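
A text document object of the kind described above, holding runs of characters together with their styling information, could be modelled as in the following sketch. The class names `Style`, `TextRun` and `TextObject`, their fields, and the helper `split_paragraphs` are hypothetical; the specification does not prescribe a concrete data layout at this point.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Style:
    """Styling information for a run of characters."""
    typeface: str
    point_size: float
    bold: bool = False
    italic: bool = False


@dataclass
class TextRun:
    """A run of characters sharing one style."""
    characters: str
    style: Style


@dataclass
class TextObject:
    """A region of stylized text, here one paragraph."""
    runs: List[TextRun] = field(default_factory=list)

    def plain_text(self) -> str:
        return "".join(run.characters for run in self.runs)


def split_paragraphs(text: str, style: Style) -> List[TextObject]:
    """Create one text object per paragraph.

    Paragraphs are taken to be the non-empty stretches of characters
    between line delimiters such as carriage returns.
    """
    return [TextObject([TextRun(p, style)]) for p in text.splitlines() if p]
```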

29 The parameters defining specific instances of 

30 document objects will generally include dimensional 

31 co-ordinates defining the physical shape, size and 
1 location of the document object and any relevant

2 temporal data for defining document objects whose 

3 properties vary with time, thereby allowing the 

4 system to deal with dynamic document structures 

5 and/or display functions. For example, a stream of 

6 video input may be treated by the system 10 as a 

7 series of figures that are changing at a rate of, 

8 for example, 30 frames per second. In this case the 

9 temporal characteristic of this figure object 

10 indicates that the figure object is to be updated 30 

11 times per second. As discussed above, for text 

12 objects, the parameters will normally also include a 

13 font and size to be applied to a character string. 

14 Object parameters may also define other properties, 

15 such as transparency. It will be understood that 

16 the internal representation may be saved/stored in a

17 file format native to the system and that the range 

18 of possible source documents 11 input to the system 

19 10 may include documents in the system's native file 

20 format. It is also possible for the internal 

21 representation 14 to be converted into any of a 

22 range of other file formats if required, using 

23 suitable conversion agents. 
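
The object parameters discussed above (dimensional co-ordinates, optional temporal data, and properties such as transparency) might be grouped as in the sketch below. The class `ObjectParameters` and its field names are assumptions made for illustration; the 30 updates per second figure in the usage note is the example rate given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectParameters:
    """Parameters for one specific instance of a document object."""
    x: float
    y: float
    width: float
    height: float
    transparency: float = 0.0
    # Temporal data: None marks a static object.
    updates_per_second: Optional[float] = None

    def update_interval(self) -> Optional[float]:
        """Seconds between updates, or None for a static object."""
        if self.updates_per_second is None:
            return None
        return 1.0 / self.updates_per_second

    def frame_index(self, elapsed_seconds: float) -> int:
        """Which update of the object is current after the given time."""
        if self.updates_per_second is None:
            return 0
        return int(elapsed_seconds * self.updates_per_second)
```

For a video stream treated as a figure object updated 30 times per second, `ObjectParameters(0, 0, 320, 240, updates_per_second=30).frame_index(2.0)` selects update number 60.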

24 Figure 3 depicts a flow chart diagram of one 

25 exemplary process that may be carried out by a 

26 document agent 12. Specifically, Figure 3 depicts a 

27 process 50 that represents the operation of an 

28 example document agent 12, in this case a document 

29 agent 12 suitable for translating the contents of a 
30 Microsoft Word document into an internal

31 representation format. Specifically, the process 50 
1 includes an initialization step 52 wherein the

2 process 50 initializes the data structures, memory 

3 space, and other resources that the process 50 will 

4 employ while translating the source document 11. 

5 After step 52 the process 50 proceeds to a series of 

6 steps, 54, 58 and 60, wherein the source document 11 

7 is analyzed and divided into subsections. In the 

8 process 50 depicted in Figure 3, steps 54, 58 and 60

9 subdivide the source document 11 as it is streamed

10 into the document agent 12 first into sections, then

11 subdivide the sections into paragraphs and then

12 subdivide paragraphs into the individual characters

13 that make up that paragraph. The sections, 

14 paragraphs and characters identified within the 

15 source document 11 may be identified within a piece 

16 table that contains pointers to the different 

17 subsections identified within the source document 

18 11. It will be understood by those of skill in the 

19 art that the piece table depicted in Figure 3 

20 represents a construct employed by MSWord for 

21 providing pointers to different subsections of a 

22 document. It will further be understood that the 

23 use of a piece table or a piece table like construct 

24 is optional and depends on the application at hand, 

25 including depending on the type of document being 

26 processed. 
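
The subdivision of steps 54, 58 and 60, with a table of pointers to the identified subsections, can be sketched as follows. This is not the MS Word piece table itself: the offset-pair representation, the function names, and the use of a form feed and a newline as section and paragraph delimiters are all assumptions chosen to keep the example small.

```python
from typing import Iterator, List, Tuple

Span = Tuple[int, int]  # (start, end) offsets into the source text


def split_spans(text: str, start: int, end: int, delimiter: str) -> List[Span]:
    """Return the non-empty spans of text[start:end] between delimiters."""
    spans: List[Span] = []
    pos = start
    while pos <= end:
        cut = text.find(delimiter, pos, end)
        if cut == -1:
            cut = end
        if cut > pos:
            spans.append((pos, cut))
        pos = cut + len(delimiter)
    return spans


def build_table(text: str) -> List[List[Span]]:
    """For each section, list the spans of the paragraphs it contains."""
    sections = split_spans(text, 0, len(text), "\f")
    return [split_spans(text, s, e, "\n") for s, e in sections]


def characters(text: str, span: Span) -> Iterator[str]:
    """Yield the individual characters that make up one paragraph."""
    start, end = span
    for i in range(start, end):
        yield text[i]
```

Because the table stores offsets rather than copies of the text, each subsection is identified by a pointer into the source, in the spirit of the construct described above.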

27 As the process 50 in step 60 begins to identify 

28 different characters that appear within a particular 

29 paragraph, the process 50 may proceed to step 62

30 wherein a style is applied to the character or set

31 of characters identified in step 60. The
1 application of a style is understood to associate

2 the identified characters with a style of 

3 presentation that is being employed with those 

4 characters. The style of presentation may include 

5 properties associated with the character including 

6 font type, font size, whether the characters are 

7 bold, italic, or otherwise stylized. Additionally, 

8 in step 62 the process can determine whether the 

9 characters are rotated, or being positioned for 

10 following a curved path or other shape. 

11 Additionally, in step 62 style associated with the 

12 paragraph in which the characters occur may also be 

13 identified and associated with the characters. Such 

14 properties can include the line spacing associated 

15 with the paragraph, the margins associated with the 

16 paragraph, the spacing between characters, and other 

17 such properties. 

After step 62 the process 50 proceeds to step 70 wherein the internal representation is built up. The object which describes the structure of the document is created in Step 64 as an object within the internal representation, and the associated style of this object, together with the character run it contains, is created separately within the internal representation at Step 68. Figures 6, 7 and 8, which will be explained in more detail hereinafter, depict figuratively the file structure created by the process 50 wherein the structure of a document is captured by a group of document objects and the data associated with the document objects is stored in a separate data structure. After step 70, the process 50 proceeds to decision block 72 wherein the process 50 determines whether the paragraph associated with the last processed character is complete. If the paragraph is not complete the process 50 returns to step 60 wherein the next character from the paragraph is read. Alternatively, if the paragraph is complete the process 50 proceeds to decision block 74 wherein the process 50 determines whether the section is complete. If the section is not complete the process returns to step 58 and the next paragraph is read from the piece table. Alternatively, if the section is complete the process 50 proceeds to step 54 wherein the next section, if there is a next section, is read from the piece table and processing continues. Once the document has been processed the system 8 can transmit, save, export or otherwise store the translated document for subsequent use. The system can store the translated file in a format compatible with the internal representation, and optionally in other formats as well, including formats compatible with the file formats of the source documents 11 (for which it may employ "export document agents", not shown, capable of receiving internal representation data and creating source document data), or in a binary form, a textual document description structure, marked-up text or in any other suitable format; and may employ a universal text encoding model, including Unicode, shift-mapping, big-5, and a luminance/chrominance model.
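
The control flow of Figure 3 can be sketched as three nested loops. The following is a minimal sketch only, assuming the source document has already been split into nested lists; the names `translate`, `StyledRun` and `InternalRepresentation` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class StyledRun:
    text: str
    style: dict


@dataclass
class InternalRepresentation:
    # Structure entries are (section index, paragraph index, run index).
    structure: list = field(default_factory=list)
    # Content is kept apart from the structure, as in steps 64 and 68.
    runs: list = field(default_factory=list)


def translate(sections):
    """sections: list of sections; a section is a list of paragraphs;
    a paragraph is a list of (character, style) pairs."""
    rep = InternalRepresentation()                  # step 52: initialize
    for s, section in enumerate(sections):          # step 54: next section
        for p, paragraph in enumerate(section):     # step 58: next paragraph
            for ch, style in paragraph:             # steps 60 and 62
                rep.runs.append(StyledRun(ch, style))            # step 68
                rep.structure.append((s, p, len(rep.runs) - 1))  # step 64
            # leaving the inner loop corresponds to decision block 72
        # leaving the middle loop corresponds to decision block 74
    return rep
```

Exhausting the inner loop plays the role of decision block 72, and exhausting the middle loop plays the role of decision block 74.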

As can be seen from the above, the format of the internal representation 14 separates the "structure" (or "layout") of the documents, as described by the object types and their parameters, from the "content" of the various objects; e.g. the character string (content) of a text object is separated from the dimensional parameters of the object; the image data (content) of a graphic object is separated from its dimensional parameters. This allows document structures to be defined in a compact manner and provides the option for content data to be stored remotely and to be fetched by the system only when needed. The internal representation 14 describes the document and its constituent objects in terms of "high-level" descriptions.

The document agent 12 described above with reference to Figure 3 is capable of processing a data file created by the MSWord word processing application and translating that data file into an internal representation that is formed from a set of object types selected from the library 16, that represents the content of the processed document. Accordingly, the document agent 12 analyzes the Word document and translates the structure and content of that document into an internal representation known to the computer process 8. One example of one type of Word document that may be processed by the document agent 12 is depicted in Figure 4. Specifically, Figure 4 depicts a Word document 32 of the type created by the MSWord application program. The depicted document 32 comprises one page of information wherein that one page includes two columns of text 34 and one figure 36. Figure 4 further depicts that the columns of text 34 and the figure 36 are positioned on the page 38 in such a way that one column of text runs from the top of the page 38 to the bottom of the page 38 and the second column of text runs from about the center of the page to the bottom of the page with the figure 36 being disposed above the second column of text 34.

As discussed above with reference to Figure 3, the document agent 12 begins processing the document 32 by determining that the document 32 comprises one page and contains a plurality of different objects. For the one page found by the document agent 12, the document agent 12 identifies the style of the page, which for example may be a page style of an 8.5 x 11 page in portrait format. The page style identified by the document agent 12 is embodied in the internal representation for later use by the parser 18 in formatting and flowing text into the document created by the process 8.

For the document 32 depicted in Figure 4 only one page is present. However, it will be understood that the document agent 12 may process Word documents comprising a plurality of pages. In such a case the document agent 12 would process each page separately by creating a page then filling it with objects of the type found in the library. Thus page style information can include that a document comprises a plurality of pages and that the pages are of a certain size. Other page style information may be identified by the document agent 12 and the page style information identified can vary according to the application. Thus different page style information may be identified by a document agent capable of processing a Microsoft Excel document or a real media data stream.

As further described with reference to Figures 3-4, once the document agent 12 has identified the page style the document agent 12 may begin to break the document 32 down into objects that can be mapped to document objects known to the system and typically stored in the library 16. For example, the document agent 12 may process the document 32 to find text objects, bitmap objects and vector graphic objects. Other object types may optionally be provided including video type, animation type, button type, and script type. In this practice, the document agent 12 will identify a text object 34 whose associated style has two columns. The paragraphs of text that occur within the text object 34 may be analyzed for identifying each character in each respective paragraph. Process 50 may apply style properties to each identified character run and each character run identified within the document 32 may be mapped to a text object of the type listed within the library 16. Each character run and the applied style can be understood as an object identified by the document agent 12 as having been found within the document 32 and having been translated to a document object, in this case a text object of the type listed within the library 16. This internal representation object may be streamed from the document agent 12 into the internal representation 14. The document agent 12 may continue to translate the objects that appear within the document 32 into document objects that are known to the system 10 until each object has been translated. The object types may be appropriate for the application and may include object types suitable for translating source data representative of a digital document, an audio/visual presentation, a music file, an interactive script, a user interface file and an image file, as well as any other file types.

Turning to Figure 5, it can be seen that the process 80 depicted in Figure 5 allows for compacting similar objects appearing within the internal representation of the source document 11, for the purpose of reducing the size of the internal representation. For example, Figure 5 depicts a process 80 wherein step 82 has a primitive library object A being processed by, in step 84, inserting that primitive object into the document that is becoming the internal representation of the source document 11. In step 88 another object B, provided by the document agent 12, is delivered to the internal representation file process 14. The process 80 then undertakes the depicted sequence of steps 92 through 98 wherein characteristics of object A are compared to the characteristics of object B to determine if the two objects have the same characteristics. For example, if object A and object B represent two characters such as the letter P and the letter N, and if both characters P and N are the same color, same font, same size and the same style such as bold or italicized, then the process 80 in step 94 joins the two objects together within one object classification stored within the internal representation. If these characteristics do not match then the process 80 adds them to the internal representation as two separate objects.
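
The comparison and joining steps above can be illustrated with a short sketch. It assumes each object is a (text, style) pair and that two objects are joined only when every style field matches; the `Style` fields and the `compact` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    color: str
    font: str
    size: int
    bold: bool = False
    italic: bool = False


def compact(objects):
    """Join each object to the preceding one when their styles are equal.

    objects: iterable of (text, Style) pairs in document order.
    Returns a list of (text, Style) runs.
    """
    runs = []
    for text, style in objects:
        if runs and runs[-1][1] == style:
            # Same characteristics: extend the existing run.
            runs[-1] = (runs[-1][0] + text, style)
        else:
            # Characteristics differ: keep a separate object.
            runs.append((text, style))
    return runs
```

For the letters P and N of the example, two entries with an identical style collapse into a single run "PN", while a following bold character stays separate.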

Figure 5 depicts a process 80 wherein the internal representation file 14 compacts the objects as a function of the similarity of physically adjacent objects. Those of ordinary skill in the art will understand that this is merely one process for compacting the objects and that other techniques may be employed. For example, in an optional practice, the compaction process may comprise a process for compacting objects that are visually adjacent.

Figures 6, 7 and 8 depict the structure of the internal representation of a document that has been processed by the system depicted in Figures 1 and 2. The internal representation of the document may be embodied as a computer file or as data stored in core memory. However, it will be apparent to those of ordinary skill in the art that the data structure selected for capturing or transporting the internal representation may vary according to the application and any suitable data structure may be employed with the systems and methods described herein without departing from the scope of the invention.

As will be described in greater detail hereinafter, the structure of the internal representation of the processed document separates the structure of the document from the content of the document. Specifically, the structure of the document is captured by a data structure that shows the different document objects that make up the document, as well as the way that these document objects are arranged relative to each other. This separation of structure from content is shown in Figure 6 wherein the data structure 110 captures the structure of the document being processed and stores that structure in a data format that is independent of the actual content associated with that document. Specifically, the data structure 110 includes a resource table 112 and a document structure 114. The resource table 112 provides a list of resources for constructing the internal representation of the document. For example, the resource table 112 can include one or more tables of common structures that occur within the document, such as type faces, links, and color lists. These common structures may be referenced numerically within the resource table 112. The resources of resource table 112 relate to the document objects that are arranged within the document structure 114. As Figure 6 shows, the document structure 114 includes a plurality of containers 118 that are represented by the sets of the nested parentheses. Within the containers 118 are a plurality of document objects 120. As shown in Figure 6 the containers 118 represent collections of document objects that appear within the document being processed. As further shown by Figure 6 the containers 118 are also capable of holding sub-containers. For example, the document structure 114 includes one top-level container, identified by the set of outer parentheses labeled 1, and has three nested containers 2, 3 and 4. Additionally, the container 4 is double nested within container 1 and container 3.
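
The nesting described for Figure 6 can be modeled as follows. This is a sketch under stated assumptions: the class names, the placement of the individual document objects, and the helper `depth_of` are hypothetical; only the container labels 1 through 4 and their nesting come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class DocObject:
    number: int
    type: str


@dataclass
class Container:
    label: int
    children: list = field(default_factory=list)  # DocObjects and Containers


def depth_of(container, label, depth=1):
    """Return the nesting depth of the container with the given label,
    or None if it is not present."""
    if container.label == label:
        return depth
    for child in container.children:
        if isinstance(child, Container):
            found = depth_of(child, label, depth + 1)
            if found is not None:
                return found
    return None


# Container 1 holds containers 2 and 3; container 4 sits inside container 3.
c4 = Container(4, [DocObject(3, "text")])
c3 = Container(3, [c4])
c2 = Container(2, [DocObject(1, "text"), DocObject(2, "bitmap")])
document_structure = Container(1, [c2, c3])
```

Here `depth_of(document_structure, 4)` reports a depth of 3, reflecting that container 4 is nested within both container 1 and container 3.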

Each container 118 represents features within a document, wherein the features may be a collection of individual document objects, such as the depicted document objects 120. Thus for example, a document, such as the document 32 depicted in Figure 4, may include a container representative of the character run wherein the character run includes the text that appears within the columns 34. The different document objects 120 that occur within the character run container may, for example, be representative of the different paragraphs that occur within that character run. The character run container has a style associated with it. For example, the character run depicted in Figure 4 can include style information representative of the character font type, font size, styling, such as bold or italic styling, and style information representative of the size of the column, including width and length, in which the character run, or at least a portion of that character run, occurs. This style information may be later used by the parser 18 to reformat and reflow the text within the context specific view 20. Another example of a container may be a table that, for example, could appear within a column 34 of text in document 32. The table may be a container with objects. The other types and uses of containers will vary according to the application at hand and the systems and methods of the invention are not limited to any particular set of object types or containers.

Thus, as the document agent 12 translates the source document 11, it will encounter objects that are of known object types, and the document agent 12 will request the library 16 to create an object of the appropriate object type. The document agent 12 will then lodge that created document object into the appropriate location within document structure 114 to preserve the overall structure of the source document 11. For example, as the document agent 12 encounters the image 36 within the source document 11, the document agent 12 will recognize the image 36, which may for example be a JPEG image, as an object of type bitmap, and optionally sub-type JPEG. The document agent 12, as shown in steps 64 and 68 of Figure 3, can create the appropriate document object 120 and can lodge the created document object 120 into the structure 114. Additionally, the data for the JPEG image document object 120, or in another example, the data for the characters and their associated style for a character run, may be stored within the data structure 150 depicted in Figure 8.

As the source document 11 is being processed, the document agent 12 may identify other containers wherein these other containers may be representative of a subfeature appearing within an existing container, such as a character run. For example, these subfeatures may include links to referenced material, or clipped visual regions or features that appear within the document and that contain collections of individual document objects 120. The document agent 12 can place these document objects 120 within a separate container that will be nested within the existing container. The arrangement of these document objects 120 and the containers 118 is shown in Figure 7A as a tree structure 130 wherein the individual containers 1, 2, 3 and 4 are shown as container objects 132, 134, 138 and 140 respectively. The containers 118 and the document objects 120 are arranged in a tree structure that shows the nested container structure of document structure 114 and the different document objects 120 that occur within the containers 118. The tree structure of Figure 7A also illustrates that the structure 114 records and preserves the structure of the source document 11, showing the source document as a hierarchy of document objects 120, wherein the document objects 120 include the style information, such as for example the size of columns in which a run of characters appears, or temporal information, such as the frame rate for streamed content. Thus, each document's graphical structure is described by a series of parameterized elements. One example of this is presented below in Table 1.



TABLE 1

Parameters      e.g.
Type            Bitmap
Bounding Box    400,200; 600,700 units (bottom left, top right)
Fill            Object 17
Alpha           0 (none)
Shape           Object 24
Time            0,-1 (infinity) [start, end]




As can be seen, Table 1 presents an example of parameters that may be used to describe a document's graphical structure. Table 1 presents examples of such parameters, such as the object type, which in this case is a Bitmap object type. A bounding box parameter is provided and gives the location of the document object within the source document 11. Table 1 further provides the Fill employed and an alpha factor that is representative of the degree of transparency for the object. A Shape parameter provides a handle to the shape of the object, which in this case could be a path that defines the outline of the object, including irregularly shaped objects. Table 1 also presents a time parameter representative of the temporal changing for that object. In this example, the image is stable and does not change with time. However, if the image object presented streamed media, then this parameter could contain a temporal characteristic that indicates the rate at which the object should change, such as a rate comparable to the desired frame rate for the content.
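
One way to hold the parameters of Table 1 in a program is a small record type. The sketch below is illustrative only: the class name `Element`, its field names and its two helper methods are assumptions, and treating an end time of -1 as "shown indefinitely" is one reading of the "0,-1 (infinity)" entry.

```python
from dataclasses import dataclass


@dataclass
class Element:
    type: str
    bounding_box: tuple  # ((left, bottom), (right, top)) in document units
    fill: int            # object number of the fill content
    alpha: int           # 0 means no transparency
    shape: int           # object number of the outline path
    time: tuple          # (start, end); an end of -1 stands for infinity

    def size(self):
        """Width and height of the bounding box."""
        (left, bottom), (right, top) = self.bounding_box
        return right - left, top - bottom

    def shown_indefinitely(self):
        """True when the end time is the 'infinity' marker."""
        return self.time[1] == -1


# The example row values from Table 1.
table_1 = Element("Bitmap", ((400, 200), (600, 700)),
                  fill=17, alpha=0, shape=24, time=(0, -1))
```

The `fill` and `shape` fields hold object numbers rather than data, which keeps the element description small and leaves the content to be stored elsewhere.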

Thus, the structural elements are containers with flowable data content, with this flowable data held separately and referenced by a handle from the container. In this way, any or all data content can be held remotely from the document structure. This allows for rendering of the document in a manner that can be achieved with a mixture of locally held and remotely held data content. Additionally, this data structure allows for rapid progressive rendering of the internal representation of the source document 11, as the broader and higher level objects can be rendered first, and the finer features can be rendered in subsequent order. Thus, the separate structure and data allow a visual document to be rendered while streaming data to "fill" the content. Additionally, the separation of content and structure allows the content of the document to readily be edited or changed. As the document structure is independent from the content, different content can be substituted into the document structure. This can be done on a container by container basis or for the whole document. The structure of the document can be delivered separately from the content and the content provided later, or made present on the platform to which the structure is delivered.
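
The effect of holding content apart from structure can be shown with a very small sketch. It assumes the structure is a list of (label, handle) pairs and that content arrives in a dictionary keyed by handle; the function `render` and the placeholder text are hypothetical.

```python
def render(structure, content):
    """Render a structure using whatever content is available so far.

    structure: list of (label, handle) pairs in document order.
    content:   dict mapping handle -> data that has already arrived.
    Elements whose data has not arrived yet are shown as placeholders.
    """
    return [content.get(handle, f"<{label} pending>")
            for label, handle in structure]
```

Calling `render` again after more content has arrived fills in the placeholders, and passing a different content dictionary substitutes new content into the same structure.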

Additionally, Figure 7A shows that the structure of a source document 11 can be represented as a tree structure 130. In one practice the tree structure may be modified and edited to change the presentation of the source document 11. For example, the tree structure may be modified to add additional structure and content to the tree 130. This is depicted in Figure 7B that shows the original tree structure of Figure 7A duplicated and presented under a higher level container. Thus, Figure 7B shows that a new document structure, and therefore new representation, may be created by processing the tree structure 130 produced by the document agent 12. This allows the visual position of objects within a document to change, while the relative position of different objects 120 may remain the same. By adjusting the tree structure 130, the systems described herein can edit and modify content. For example, in those applications where the content within the tree structure 130 is representative of visual content, the systems described herein can edit the tree structure to duplicate the image of the document, and present side by side images of the document. Alternatively, the tree structure 130 can be edited and supplemented to add additional visual information, such as by adding the image of a new document or a portion of that document. Moreover, by controlling the rate at which the tree structure is changed, the systems described herein can create the illusion of a document gradually changing, such as sliding across a display, such as display device 26, or gradually changing into a new document. Other effects, such as the creation of thumbnail views and other similar results, can be achieved by those of ordinary skill by making modifications to the systems and methods described herein, and such modified systems and methods will fall within the scope of the invention.
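
The duplication shown in Figure 7B can be sketched with nested dictionaries standing in for the tree structure 130. The dictionary layout and the function name `present_side_by_side` are assumptions made for illustration.

```python
import copy


def present_side_by_side(tree):
    """Return a new higher-level container holding the original tree
    and an independent copy of it."""
    return {"container": "new top level",
            "children": [tree, copy.deepcopy(tree)]}
```

Because the second child is a deep copy, later edits to one branch do not affect the other, while the relative arrangement of objects inside each branch is unchanged.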

The data of the source document 11 is stored separately from the structure 114. To this end, each document object 120 includes a pointer to the data associated with that object and this information may be arranged within an indirection list such as the indirection list 152 depicted in Figure 8. In this practice, and as shown in Figure 8, each document object 120 is numbered and an indirection list 152 is created wherein each document object number 154 is associated with an offset value 158. For example, the document object number 1, identified by reference number 160, may be associated with the offset 700, identified by reference number 162. Thus, the indirection list associates the object number 1 with the offset 700. The offset 700 may represent a location in core memory, or a file offset, wherein the data associated with object 1 may reside. As further shown in Figure 8 a data structure 150 may be present wherein the data that is representative of the content associated with a respective document object 120 may be stored. Thus for example, the depicted object 1 at jump location 700 may include the Unicode characters representative of the characters that occur within the character run of the container 1 depicted in Figure 6. Similarly, the object 2 data, depicted in Figure 8 by reference number 172, and associated with in-core memory location 810, identified by reference numeral 170, may be representative of the JPEG bit map associated with a bit map document object 120 referenced within the document structure 114 of Figure 6.
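
An indirection list of this kind can be sketched as a mapping from object number to a position in a single block of content data. The sketch below is an assumption-laden illustration: it stores a length next to each offset (the description above mentions only an offset), and the names `build_store` and `fetch` are hypothetical.

```python
def build_store(contents):
    """Pack per-object content into one block and build an indirection list.

    contents: dict mapping object number -> bytes.
    Returns (indirection, blob) where indirection maps
    object number -> (offset, length) within blob.
    """
    indirection = {}
    blob = bytearray()
    for number in sorted(contents):
        indirection[number] = (len(blob), len(contents[number]))
        blob += contents[number]
    return indirection, bytes(blob)


def fetch(indirection, blob, number):
    """Follow the indirection list to the content of one object."""
    offset, length = indirection[number]
    return blob[offset:offset + length]
```

For instance, object 1 might hold the encoded characters of a character run and object 2 the bytes of a bitmap; both live in the same block and are reached only through their offsets.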

It will be noted by those of skill in the art that, as the data is separated from the structure, the content for a source document is held in a centralized repository. As such, the systems described herein allow for compressing across different types of data objects. Such processes provide for greater storage flexibility in limited resource systems.

Returning to Figure 2, it will be understood that once the process for compacting the content of an internal representation file completes compacting different objects, these objects are passed to the parser 18. The parser 18 parses the objects identified in the structure section of the internal representation, and with reference to the data content associated with this object, it re-applies the position and styling information to each object.

The renderer 19 generates a context-specific representation or "view" 20 of the documents represented by the internal representation 14. The required view may be of all the documents, a whole document or of parts of one or some of the documents. The renderer 19 receives view control inputs 40 which define the viewing context and any related temporal parameters of the specific document view which is to be generated. For example, the system 10 may be required to generate a zoomed view of part of a document, and then to pan or scroll the zoomed view to display adjacent portions of the document. The view control inputs 40 are interpreted by the renderer 19 to determine which parts of the internal representation are required for a particular view and how, when and for how long the view is to be displayed.
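
Determining which parts of the internal representation a view requires can be illustrated with a bounding-box test. This is a sketch only, assuming each object carries a box in the ((left, bottom), (right, top)) form used in Table 1 and that the view is a rectangle in the same units; `intersects` and `visible_objects` are hypothetical names.

```python
def intersects(a, b):
    """True when two ((left, bottom), (right, top)) boxes overlap."""
    (ax0, ay0), (ax1, ay1) = a
    (bx0, by0), (bx1, by1) = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def visible_objects(objects, viewport):
    """Names of the objects whose bounding boxes overlap the viewport.

    objects:  dict mapping name -> bounding box.
    viewport: bounding box of the requested view.
    """
    return [name for name, box in objects.items()
            if intersects(box, viewport)]
```

Panning or zooming then amounts to calling `visible_objects` again with a moved or resized viewport.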

The context-specific representation/view 20 is expressed in terms of primitive shapes and parameters.

The renderer 19 may also perform additional pre-processing functions on the relevant parts of the internal representation 14 when generating the required view 20 of the source document 11. The view representation 20 is input to a shape processor 22 for processing to generate an output in a format suitable for driving an output device 26, such as a display device or printer.

The pre-processing functions of the renderer 19 may include colour correction, resolution adjustment/enhancement and anti-aliasing. Resolution enhancement may comprise scaling functions which preserve the legibility of the content of objects when displayed or reproduced by the target output device. Resolution adjustment may be context-sensitive; e.g. the display resolution of particular objects may be reduced while the displayed document view is being panned or scrolled and increased when the document view is static.

Optionally, there may be a feedback path 42 between the parser 18 and the internal representation 14, e.g. for the purpose of triggering an update of the content of the internal representation 14, such as in the case where the source document 11 represented by the internal representation comprises a multi-frame animation.

The output from the renderer 19 expresses the document in terms of primitive objects. For each document object, the representation from the renderer 19 defines the object at least in terms of a physical, rectangular boundary box, the actual outline path of the object bounded by the boundary box, the data content of the object, and its transparency.

The shape processor 22 interprets the primitive object and converts it into an output frame format appropriate to the target output device 26; e.g. a dot-map for a printer, vector instruction set for a plotter, or bitmap for a display device. An output control input 44 to the shape processor 22 provides information to the shape processor 22 to generate output suitable for a particular output device 26.

The shape processor 22 preferably processes the objects defined by the view representation 20 in terms of "shape" (i.e. the outline shape of the object), "fill" (the data content of the object) and "alpha" (the transparency of the object), performs scaling and clipping appropriate to the required view and output device, and expresses the object in terms appropriate to the output device (typically in terms of pixels by scan conversion or the like, for most types of display device or printer). The shape processor 22 optionally includes an edge buffer which defines the shape of an object in terms of scan-converted pixels, and preferably applies anti-aliasing to the outline shape. Anti-aliasing may be performed in a manner determined by the characteristics of the output device 26, by applying a grey-scale ramp across the object boundary. This approach enables memory efficient shape-clipping and shape-intersection processes, and is processor efficient as well. A look-up table, or other technique, may be employed to define multiple tone response curves, allowing non-linear rendering control. The individual primitive objects processed by the shape processor 22 are combined in the composite output frame. The design of one shape processor suitable for use with the systems described herein is shown in greater detail in the patent application entitled Shape Processor, filed on even date herewith, the contents of which are incorporated by reference. However, any suitable shape processor system or process may be employed without departing from the scope of the invention.
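
How "shape", "fill" and "alpha" might combine for a single pixel can be sketched as a simple blend. This is not the design of the referenced Shape Processor application; it is an illustrative sketch assuming grey levels from 0 to 255, an alpha value where 0.0 means no transparency (as in Table 1), and a coverage value from 0.0 to 1.0 standing in for the grey-scale ramp at the object boundary.

```python
def composite_pixel(dst, fill, alpha, coverage):
    """Blend one fill pixel over one destination pixel.

    dst, fill: grey levels in the range 0..255.
    alpha:     transparency of the object, 0.0 (opaque) to 1.0 (invisible).
    coverage:  fraction of the pixel inside the shape outline, 0.0 to 1.0;
               fractional values at the edge give the anti-aliasing ramp.
    """
    opacity = (1.0 - alpha) * coverage
    return round(fill * opacity + dst * (1.0 - opacity))
```

A pixel fully inside an opaque shape takes the fill value, a pixel outside the shape keeps the destination value, and edge pixels fall in between according to their coverage.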



4 As discussed above, the process 8 depicted in Figure 

5 1 can be realized as a software component operating 

6 on a data processing system such as a hand held 

7 computer, a mobile telephone, set top box, facsimile 

8 machine, copier or other office equipment, an 

9 embedded computer system, a Windows or Unix 

10 workstation, or any other type of 

11 computer /processing platform capable of supporting, 

12 in whole or in part, the document processing system 

13 described above. In these embodiments, the system 

14 can be implemented as a C language computer program, 

15 or a computer program written in any high level 

16 language including C++, Fortran, Java or Basic. 

17 Additionally, in an embodiment where 

18 microcontrollers or DSPs are employed, the systems 

19 can be realized as a computer program written in 

20 microcode or written in a high level language and 

21 compiled down to microcode that can be executed on 

22 the platform employed. The development of such 

23 systems is known to those of skill in the art, and 

24 such techniques are set forth in Intel® StrongARM 

25 processors SA-1110 Microprocessor Advanced 

26 Developer's Manual. Additionally, general 

27 techniques for high level programming are known, and 

28 set forth in, for example, Stephen G. Kochan, 

29 Programming in C, Hayden Publishing (1983). It is 

30 noted that DSPs are particularly suited for 

31 implementing signal processing functions, including 
1 preprocessing functions such as image enhancement 

2 through adjustments in contrast, edge definition and 

3 brightness. Developing code for the DSP and 

4 microcontroller systems follows from principles well 

5 known in the art. 

6 Accordingly, although Figs. 1 and 2 graphically 

7 depict the computer process 8 as comprising a 

8 plurality of functional block elements, it will be 

9 apparent to one of ordinary skill in the art that 

10 these elements can be realized as computer programs 

11 or portions of computer programs that are capable of 

12 running on the data processing platform to thereby 

13 configure the data processing platform as a system 

14 according to the invention. Moreover, although Fig. 

15 1 depicts the system 10 as an integrated unit of a 

16 document processing process 8 and a display device 

17 26, it will be apparent to those of ordinary skill 

18 in the art that this is only one embodiment, and 

19 that the systems described herein can be realized 

20 through other architectures and arrangements, 

21 including system architectures that separate the 

22 document processing functions of the process 8 from 

23 the document display operation performed by the 

24 display 26. Moreover, it will be understood that 

25 the systems of the invention are not limited to 

26 those systems that include a display or output 

27 device, but that the systems of the invention will 

28 encompass those processing systems that process one 

29 or more digital documents to create output that can 

30 be presented on an output device. However, this 

31 output may be stored in a data file for subsequent 
1 presentation on a display device, for long term 

2 storage, for delivery over a network, or for some 

3 other purpose than for immediate display. 

4 Accordingly, it will be apparent to those of skill 

5 in the art that the systems and methods described 

6 herein can support many different document and 

7 content processing applications and that the 

8 structure of the system or process employed for a 

9 particular application will vary according to the 

10 application and the choice of the designer. 

11 From the foregoing, it will be understood that the 

12 system of the present invention may be "hard-wired"; 

13 e.g. implemented in ROM and/or integrated into ASICs 

14 or other single-chip systems, or may be implemented 

15 as firmware (programmable ROM such as flashable 

16 ePROM), or as software, being stored locally or 

17 remotely and being fetched and executed as required 

18 by a particular device. Such improvements and 

19 modifications may be incorporated without departing 

20 from the scope of the present invention. 

21 Those skilled in the art will know or be able to 

22 ascertain using no more than routine 

23 experimentation, many equivalents to the embodiments 

24 and practices described herein. For example, the 

25 systems and methods described herein may be stand 

26 alone systems for processing source documents 11, 

27 but optionally these systems may be incorporated 

28 into a variety of types of data processing systems 

29 and devices, and into peripheral devices, in a 

30 number of different ways. In a general purpose data 
1 processing system (the "host system"), the system of 

2 the present invention may be incorporated alongside 

3 the operating system and applications of the host 

4 system or may be incorporated fully or partially 

5 into the host operating system. For example, the 

6 systems described herein enable rapid display of a 

7 variety of types of data files on portable data 

8 processing devices with LCD displays without 

9 requiring the use of browsers or application 

10 programs. Examples of portable data processing 

11 devices which may employ the present system include 

12 "palmtop" computers, portable digital assistants 

13 (PDAs, including tablet-type PDAs in which the 

14 primary user interface comprises a graphical display 

15 with which the user interacts directly by means of a 

16 stylus device), internet-enabled mobile telephones 

17 and other communications devices. This class of 

18 data processing devices requires small size, low 

19 power processors for portability. Typically, these 

20 devices employ advanced RISC-type core processors 

21 designed into ASICs (application specific 

22 integrated circuits), in order that the electronics 

23 package is small and integrated. This type of 

24 device also has limited random access memory and 

25 typically has no non-volatile data store (e.g. hard 

26 disk). Conventional operating system models, such 

27 as are employed in standard desktop computing 

28 systems (PCs), require high powered central 

29 processors and large amounts of memory to process 

30 digital documents and generate useful output, and 

31 are entirely unsuited for this type of data 

32 processing device. In particular, conventional 
1 systems do not provide for the processing of 

2 multiple file formats in an integrated manner. By 

3 contrast, the systems described herein employ common 

4 processes and pipelines for all file formats, 

5 thereby providing a highly integrated document 

6 processing system which is extremely efficient in 

7 terms of power consumption and usage of system 

8 resources. 
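
The common-pipeline idea can be sketched as follows: a dispatcher associates an input bytestream with one of a set of known formats, a format-specific document agent parses it into generic document objects, and every later stage handles only those objects. This is a minimal sketch under stated assumptions. The magic-number check, the two toy agents, the 12-unit line spacing, and the DocumentObject fields are hypothetical and chosen only for illustration; the PGM agent ignores comment lines that real PGM files may contain.

```python
from dataclasses import dataclass


@dataclass
class DocumentObject:
    kind: str        # generic object type, e.g. "text" or "bitmap"
    content: object  # payload, kept separate from position
    x: int = 0
    y: int = 0


def text_agent(data):
    """Toy agent: one text object per line, stacked vertically."""
    lines = data.decode("utf-8").splitlines()
    return [DocumentObject("text", line, 0, i * 12)
            for i, line in enumerate(lines)]


def pgm_agent(data):
    """Toy agent for ASCII PGM ("P2") images: one bitmap object."""
    tokens = data.decode("ascii").split()
    width, height = int(tokens[1]), int(tokens[2])
    pixels = [int(t) for t in tokens[4:4 + width * height]]
    rows = [pixels[r * width:(r + 1) * width] for r in range(height)]
    return [DocumentObject("bitmap", rows)]


def dispatch(data):
    """Associate the bytestream with a format and return its agent."""
    if data.startswith(b"P2"):
        return pgm_agent
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("no agent for this bytestream")
    return text_agent


def layout(objects):
    """Format-independent stage: map each object to a display location."""
    return [(obj.kind, obj.x, obj.y) for obj in objects]


def process(data):
    """Run any supported format through the same downstream stage."""
    agent = dispatch(data)
    return layout(agent(data))
```

In this sketch, supporting a further file format means adding one agent and one dispatch rule; layout and anything after it stay unchanged.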

9 The system of the invention may be integrated at the 

10 BIOS level of portable data processing devices to 

11 enable document processing and output with much 

12 lower overhead than conventional system models. 

13 Alternatively, these systems may be implemented at 

14 the lowest system level just above the transport 

15 protocol stack. For example, the system may be 

16 incorporated into a network device (card) or system, 

17 to provide in-line processing of network traffic 

18 (e.g. working at the packet level in a TCP/IP 

19 system). 

20 The systems herein can be configured to operate with 

21 a predetermined set of data file formats and 

22 particular output devices; e.g. the visual display 

23 unit of the device and/or at least one type of 

24 printer. 

25 The systems described herein may also be 

26 incorporated into low cost data processing terminals 

27 such as enhanced telephones and "thin" network 

28 client terminals (e.g. network terminals with 

29 limited local processing and storage resources), and 

30 "set-top boxes" for use in interactive/internet- 
1 enabled cable TV systems. The systems may also be 

2 incorporated into peripheral devices such as 

3 hardcopy devices (printers and plotters), display 

4 devices (such as digital projectors), networking 

5 devices, input devices (cameras, scanners, etc.) and 

6 also multi-function peripherals (MFPs). When 

7 incorporated into a printer, the system enables the 

8 printer to receive raw data files from the host data 

9 processing system and to reproduce the content of 

10 the original data file correctly, without the need 

11 for particular applications or drivers provided by 

12 the host system. This avoids or reduces the need to 

13 configure a computer system to drive a particular 

14 type of printer. The present system directly 

15 generates a dot-mapped image of the source document 

16 suitable for output by the printer (this is true 

17 whether the system is incorporated into the printer 

18 itself or into the host system). Similar 

19 considerations apply to other hardcopy devices such 

20 as plotters. 

21 When incorporated into a display device, such as a 

22 projector, the system again enables the device to 

23 display the content of the original data file 

24 correctly without the use of applications or drivers 

25 on the host system, and without the need for 

26 specific configuration of the host system and/or 

27 display device. Peripheral devices of these types, 

28 when equipped with the present system, may receive 

29 and output data files from any source, via any type 

30 of data communications network. 
1 Additionally, the systems and methods described 

2 herein may be incorporated into in-car systems that 

3 provide driver information or entertainment, 

4 to facilitate the delivery of information 

5 within the vehicle or to a network that communicates 

6 beyond the vehicle. Further, it will be understood 

7 that the systems described herein can drive devices 

8 having multiple output sources to maintain a 

9 consistent display using modifications to only the 

10 control parameters. Examples include, but are not 

11 limited to, an STB or in-car system incorporating a 

12 visual display and print head, thereby enabling 

13 viewing and printing of documents without the need 

14 for the source applications and drivers. 
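
One way to picture driving several outputs from a single internal representation is sketched below: the same object list is rendered twice, and only the control parameters differ. The ViewParams fields (scale, levels) and the (x, y, w, h, grey) object layout are assumptions made for illustration; the text does not enumerate the control parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewParams:
    scale: float  # device units per document unit
    levels: int   # number of grey levels the device can reproduce


def render(objects, params):
    """Map (x, y, w, h, grey) objects, grey in 0.0..1.0, to device values."""
    out = []
    for x, y, w, h, grey in objects:
        level = round(grey * (params.levels - 1))
        out.append((round(x * params.scale), round(y * params.scale),
                    round(w * params.scale), round(h * params.scale),
                    level))
    return out


# One internal representation, two hypothetical output devices.
document = [(10, 20, 100, 50, 0.25), (0, 0, 4, 4, 1.0)]
screen = render(document, ViewParams(scale=1.0, levels=256))
printer = render(document, ViewParams(scale=4.0, levels=2))
```

The document list is never modified, so the display and the print head in this sketch stay consistent with each other by construction.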


24 Accordingly, it will be understood that the 

25 invention is not to be limited to the embodiments 

26 disclosed herein, but is to be understood from the 

27 following claims, which are to be interpreted as 

28 broadly as allowed under the law. 
29 
1 CLAIMS 

2 1. A digital document processing system, 

3 comprising 

4 an application dispatcher for receiving an 

5 input bytestream representing source data in one of 

6 a plurality of predetermined data formats and for 

7 associating the input bytestream with one of said 

8 plurality of predetermined data formats, 

9 a document agent for interpreting said input 

10 bytestream as a function of said associated 

11 predetermined data format and for parsing the input 

12 bytestream into a stream of document objects 

13 representative of internal representations of 

14 primitive structures within the input bytestream, 

15 and 

16 a core document engine for converting said 

17 document objects into an internal representation 

18 data format and for mapping said internal 

19 representation data to a location on a display. 
20 

21 2. A digital document system according to claim 1, 

22 further comprising 

23 a shape processor for processing said internal 

24 representation data to drive an output device. 
25 

26 3. A digital document processing system as claimed 

27 in claim 1 or 2, wherein said source data defines 

28 the content and structure of a digital document, and 

29 wherein said internal representation data describes 

30 said structure in terms of document objects of a 

31 plurality of data types and parameters defining 
1 properties of specific instances of the document 

2 objects, separately from said content. 

3 4. A digital document processing system according 

4 to claim 3, wherein the parameters defining 

5 properties of specific instances include properties 

6 selected from the group consisting of dimensional, 

7 temporal, and physical. 

8 5. A digital document processing system as claimed 

9 in claim 3 or 4, further including a library of 

10 object types, said internal representation data 

11 being based on the content of said library. 

12 6. A digital document processing system as claimed 

13 in any of claims 3 to 5, wherein said core document 

14 engine includes a parsing and rendering module 

15 adapted to generate an object and parameter based 

16 representation of a specific view of at least part 

17 of said internal representation data, on the basis 

18 of a first control input to said parsing and 

19 rendering module. 

20 7. A digital document processing system according 

21 to claim 6 wherein said parameter based 

22 representation includes parameters selected from the 

23 group consisting of fill, path, bounding box and 

24 transparency. 

25 8. A digital document processing system according 

26 to any of claims 5 to 7, further including a shape 

27 processing module adapted to receive said object and 

28 parameter based representation of said specific view 
1 from said parsing and rendering module and to 

2 convert said object and parameter based 

3 representation into an output data format suitable 

4 for driving a particular output device. 

5 9. A digital document processing system according 

6 to claim 8, wherein said shape processing module 

7 processes said objects on the basis of a shape 

8 defining the shape of the object bounded by the 

9 boundary box, the data content of the object and the 

10 transparency of the object. 

11 10. A digital document processing system according 

12 to claim 8 or 9, wherein said shape processing 

13 module processes said objects on the basis of a 

14 shape defining the shape of the object bounded by 

15 the boundary box representative of a defined area on 

16 a display on which an object may be rendered. 

17 11. A digital document processing system according 

18 to any preceding claim, wherein the system employs a 

19 chrominance/luminance-based colour model to describe 

20 colour data. 
21 

22 12. A digital document processing system according 

23 to any preceding claim, wherein the system employs a 

24 universal text encoding model. 
25 

26 13. A digital document processing system according 

27 to claim 12, wherein universal text encoding 

28 includes Unicode, shift-mapping and big-5. 
1 14. A digital document processing system according 

2 to any preceding claim, further including a process 

3 for compacting an internal representation of a 

4 source document by combining document objects having 

5 similar attributes. 

6 15. A digital document processing system according 

7 to any preceding claim, further including a process 

8 for compacting an internal representation of a 

9 source document by combining document objects having 

10 similar style attributes. 

11 16. A digital document processing system according 

12 to any preceding claim, wherein the system is 

13 adapted for multiple parallel implementation for 

14 processing source data from one or more data sources 

15 and for generating one or more sets of output 

16 representation data. 

17 17. A digital document processing system according 

18 to any preceding claim, further comprising a 

19 graphical user interface for generating internal 

20 representations of interactive visual displays to be 

21 employed by a user for controlling the digital 

22 document processing system. 

23 18. A digital document processing system according 

24 to claim 17, comprising a data processing device 

25 incorporating a graphical user interface. 






19. A digital document processing system according 
to any preceding claim, having a platform adapted 
for being embedded into a device selected from the 
group consisting of a hand held computer, a mobile 
telephone, a set top box, a facsimile machine, a 
copier, an embedded computer system, a printer, an 
in-car system and a computer workstation. 

20. A digital document processing system according 
to any preceding claim, having a processor including 
a core processor system. 

21. A digital document processing system according 
to claim 20, wherein said core processor is a RISC 
processor. 

22. A digital document processing system according 
to any preceding claim, wherein the document agent 
includes an export process for exporting data in a 
selected format. 

23. A digital document processing system according 
to any preceding claim, adapted for operating on a 
multiple processing system. 

24. A method for displaying content, comprising 
receiving a source of data representative of 

the digital content having a structure and data 
content, 

processing the source of data to identify a 
file format associated therewith, 

translating the source of data, as a function 
of its identified file format, into an internal 
representation that includes a first data structure 
for storing information about the structure of the 
1 digital content, and a second data structure for 

2 storing information about the data content contained 

3 in the digital content, 

4 generating a content file representative of an 

5 internal representation of content to be presented 

6 to a user, by processing the first data structure to 

7 determine a structure for a portion of the content 

8 file and by processing the second data structure to 

9 determine data content for the respective portion of 

10 the content file. 

11 25. A method according to claim 24, wherein 

12 receiving a source of data includes receiving a 

13 stream of input data from a data source. 

14 26. A method according to claim 25, wherein the 

15 data source is selected from the group consisting of 

16 a data file, a byte stream generated from a 

17 peripheral device, and a byte stream generated from 

18 a data file. 

19 27. A method according to claim 25 or 26, wherein 

20 processing the source of data includes 

21 presenting information about the source of data to a 

22 plurality of document agents, each being capable of 

23 translating a data source of a known file format 

24 into the internal representation. 

25 28. A method according to any of claims 24 to 27, 

26 wherein 

27 translating the source of data into an internal 

28 representation includes processing the source of 

29 data to identify data therein, and mapping the 
1 identified data to a set of object types 

2 representative of types of content that are present 

3 in a source of data. 
4 

5 29. A method according to claim 28, wherein mapping 

6 includes mapping identified data to a set of object 

7 types suitable for translating source data 

8 representative of a content selected from the group 

9 consisting of a digital document, an audio/visual 

10 presentation, a music file, an interactive script, a 

11 user interface file and an image file. 

12 30. A method according to any of claims 24 to 29, 

13 wherein mapping includes mapping the identified data 

14 to a set of object types including a bitmap object 

15 type, a vector graphic object type, a video type, an 

16 animation type, a button type, a script type and a 

17 text object type. 

18 31. A method according to any of claims 24 to 30, 

19 wherein translating the source of data includes 

20 filtering portions of the source data to create a 

21 filtered internal representation of the source 

22 document. 

23 32. A method according to any of claims 24 to 31, 

24 wherein translating the source of data includes 

25 altering the first data structure to adjust the 

26 structure of the digital content. 

27 33. A method according to any of claims 24 to 32, 

28 wherein translating the source of data includes the 
1 further act of substituting data content in the 

2 second data structure to thereby modify content 

3 presented within the internal representation. 

4 34. A method according to any of claims 24 to 33, 

5 wherein translating the source of data includes 

6 translating the source of data into a set of 

7 document objects of known object types, wherein a 

8 document object includes a set of parameters that 

9 define dimensional, temporal and physical 
10 characteristics of the document object. 
11 

12 35. A method according to any of claims 24 to 34, 

13 wherein the process is adapted for running on 

14 multiple processors. 
15 

16 36. A method according to any of claims 24 to 35, 

17 wherein the process provides a text encoding 

18 process, for encoding in a format selected from the 

19 group consisting of Unicode, shift-mapping and big- 

20 5. 

21 37. A method according to any of claims 24 to 36, 

22 wherein generating a content data file includes 

23 parsing a set of document objects having associated 

24 parameters, to define a structure and content for 

25 the content data file. 






38. A method according to claim 37, further 
including processing the structure and content of 
the content data file to create a set of objects 
1 that define the content data file and are capable of 

2 being rendered on an output device. 

3 39. A method according to claim 37 or 38, wherein 

4 processing the document objects includes processing 

5 the associated parameters for flowing content into a 

6 structure defined by the document object. 

7 40. A method according to claim 38 or claim 39 when 

8 dependent on claim 38, wherein the output device 

9 includes a display selected from the group 

10 consisting of a visual display, an audio speaker, a 

11 video player, a television display, printer, disc 

12 drive, network, and an embedded display. 
13 

14 41. A system for interacting with content in a 

15 digital document, comprising 

16 a document agent for converting content in the 

17 digital document into a set of document objects 

18 representative of internal representations of 

19 primitive structures, and 

20 a core document engine for rendering said 

21 document objects to generate a display 

22 representative of the digital content, 

23 a user interface for detecting input signals 

24 representative of input for modifying the content of 

25 the digital document, and 

26 a process for changing the internal 

27 representation of the content as a function of the 

28 input signals, to modify the display of the digital 

29 content. 
1 42. A system according to claim 41, wherein the 

2 user interface includes an input device selected 

3 from the group consisting of a mouse, a touch pad, a 

4 touch screen, a joy stick, a remote control and a 

5 keypad. 